Four Corners
- Asia > Singapore (0.04)
- North America > United States > Oregon > Marion County > Four Corners (0.04)
- Asia > Japan > Honshū > Chūbu > Toyama Prefecture > Toyama (0.04)
- Education (0.69)
- Information Technology > Software (0.46)
The Marine Debris Forward-Looking Sonar Datasets
Valdenegro-Toro, Matias; Padmanabhan, Deepan Chakravarthi; Singh, Deepak; Wehbe, Bilal; Petillot, Yvan
Sonar sensing is fundamental for underwater robotics, but it is held back by the capabilities of AI systems, which require large training datasets, and public data in sonar modalities is scarce. This paper presents the Marine Debris Forward-Looking Sonar datasets, captured in three different settings (watertank, turntable, flooded quarry) to increase dataset diversity and annotated for multiple computer vision tasks: object classification, object detection, semantic segmentation, patch matching, and unsupervised learning. We provide a full dataset description, basic analysis, and initial results for some tasks. We expect the research community will benefit from this dataset, which is publicly available at https://doi.org/10.5281/zenodo.15101686
- Europe > United Kingdom > Scotland > City of Edinburgh > Edinburgh (0.04)
- Europe > Germany > Bremen > Bremen (0.04)
- North America > United States > Oregon > Marion County > Four Corners (0.04)
- (2 more...)
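The abstract above names five tasks but not a file layout. As a purely illustrative sketch, the snippet below iterates over a classification split assumed to be organized as class-named folders of PNG sonar snippets; the path `marine-debris-fls/watertank/train` and the folder convention are assumptions for illustration, so check the actual Zenodo archive before relying on them.

```python
# Minimal loading sketch for an image-classification split of the
# Marine Debris FLS datasets. The directory layout (class-named
# subfolders of PNG sonar snippets) is an assumption, not the
# documented archive structure.
from pathlib import Path

from PIL import Image


def load_classification_split(root: str):
    """Yield (image, class_name) pairs from class-named subfolders."""
    for class_dir in sorted(Path(root).iterdir()):
        if not class_dir.is_dir():
            continue
        for image_path in sorted(class_dir.glob("*.png")):
            # Sonar snippets are single-channel; convert defensively.
            yield Image.open(image_path).convert("L"), class_dir.name


if __name__ == "__main__":
    # Hypothetical path; adjust to the real archive layout.
    for image, label in load_classification_split("marine-debris-fls/watertank/train"):
        print(image.size, label)
        break
```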
VideoGUI: A Benchmark for GUI Automation from Instructional Videos
Lin, Kevin Qinghong; Li, Linjie; Gao, Difei; Wu, Qinchen; Yan, Mingyi; Yang, Zhengyuan; Wang, Lijuan; Shou, Mike Zheng
Graphical User Interface (GUI) automation holds significant promise for enhancing human productivity by assisting with computer tasks. Existing task formulations primarily focus on simple tasks that can be specified by a single, language-only instruction, such as "Insert a new slide." In this work, we introduce VideoGUI, a novel multi-modal benchmark designed to evaluate GUI assistants on visual-centric GUI tasks. Sourced from high-quality web instructional videos, our benchmark focuses on tasks involving professional and novel software (e.g., Adobe Photoshop or Stable Diffusion WebUI) and complex activities (e.g., video editing). VideoGUI evaluates GUI assistants through a hierarchical process, allowing identification of the specific level at which they fail: (i) high-level planning: reconstruct procedural subtasks from visual conditions without language descriptions; (ii) middle-level planning: generate sequences of precise action narrations based on visual state (i.e., screenshot) and goals; (iii) atomic action execution: perform specific actions such as accurately clicking designated elements. For each level, we design evaluation metrics across individual dimensions to provide clear signals, such as individual performance in clicking, dragging, typing, and scrolling for atomic action execution. Our evaluation on VideoGUI reveals that even the state-of-the-art large multimodal model GPT-4o performs poorly on visual-centric GUI tasks, especially in high-level planning.
- North America > United States > Oregon > Marion County > Four Corners (0.04)
- Asia > Singapore (0.04)
- Asia > Japan > Honshū > Chūbu > Toyama Prefecture > Toyama (0.04)
- Research Report (0.82)
- Instructional Material > Course Syllabus & Notes (0.61)
- Education > Educational Technology > Audio & Video (0.71)
- Education > Educational Technology > Media (0.61)
- Information Technology > Graphics (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
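The atomic-action level described above lends itself to simple per-action metrics. The sketch below shows one plausible scoring rule for clicking, counting a predicted click as a hit when it lands inside the ground-truth element's bounding box; the tuple formats and the rule itself are illustrative assumptions, not VideoGUI's official scorer.

```python
# A hedged sketch of one plausible atomic-action metric in the spirit
# of VideoGUI's clicking evaluation. Data formats are assumptions.
from typing import Iterable, Tuple

Point = Tuple[float, float]              # (x, y) in screen pixels
Box = Tuple[float, float, float, float]  # (left, top, right, bottom)


def click_hit(pred: Point, target: Box) -> bool:
    """True if the predicted click point lies inside the target box."""
    x, y = pred
    left, top, right, bottom = target
    return left <= x <= right and top <= y <= bottom


def click_accuracy(preds: Iterable[Point], targets: Iterable[Box]) -> float:
    """Fraction of predicted clicks landing in their target elements."""
    pairs = list(zip(preds, targets))
    return sum(click_hit(p, t) for p, t in pairs) / len(pairs)


if __name__ == "__main__":
    preds = [(105.0, 42.0), (300.0, 300.0)]
    targets = [(100, 30, 140, 60), (0, 0, 50, 50)]
    print(click_accuracy(preds, targets))  # 0.5: first click hits, second misses
```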
Generative Interpretation
Arbel, Yonathan A.; Hoffman, David
We introduce generative interpretation, a new approach to estimating contractual meaning using large language models. As AI triumphalism is the order of the day, we proceed by way of grounded case studies, each illustrating the capabilities of these novel tools in distinct ways. Taking well-known contracts opinions, and sourcing the actual agreements that they adjudicated, we show that AI models can help factfinders ascertain ordinary meaning in context, quantify ambiguity, and fill gaps in parties' agreements. We also illustrate how models can calculate the probative value of individual pieces of extrinsic evidence. After offering best practices for the use of these models given their limitations, we consider their implications for judicial practice and contract theory. Using LLMs permits courts to estimate what the parties intended cheaply and accurately, and as such generative interpretation unsettles the current interpretative stalemate. Their use responds to efficiency-minded textualists and justice-oriented contextualists, who argue about whether parties will prefer cost and certainty or accuracy and fairness. Parties--and courts--would prefer a middle path, in which adjudicators strive to predict what the contract really meant, admitting just enough context to approximate reality while avoiding unguided and biased assimilation of evidence. As generative interpretation offers this possibility, we argue it can become the new workhorse of contractual interpretation.
- North America > United States > New York (0.05)
- North America > United States > California (0.04)
- North America > United States > Pennsylvania (0.04)
- (13 more...)
- Research Report (1.00)
- Overview (0.92)
- Law > Litigation (1.00)
- Health & Medicine (1.00)
- Government > Regional Government > North America Government > United States Government (1.00)
- (5 more...)
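One concrete way to quantify ambiguity, as the abstract describes, is to repeatedly sample a model's choice among candidate readings of a disputed clause and measure the entropy of the resulting distribution. The sketch below illustrates the idea; `sample_reading()` is a hypothetical stand-in for a real LLM call, and the paper does not prescribe this exact procedure.

```python
# A minimal sketch of quantifying contractual ambiguity: sample an
# LLM's choice among candidate readings many times and report the
# normalized entropy (0 = unambiguous, 1 = maximally ambiguous).
import math
import random
from collections import Counter


def sample_reading(clause: str, readings: list[str]) -> str:
    # Hypothetical stand-in for a real LLM call that picks one reading.
    return random.choice(readings)


def ambiguity_score(clause: str, readings: list[str], n_samples: int = 100) -> float:
    """Normalized entropy of the model's reading distribution."""
    if len(readings) < 2:
        return 0.0  # a single reading cannot be ambiguous
    counts = Counter(sample_reading(clause, readings) for _ in range(n_samples))
    probs = [c / n_samples for c in counts.values()]
    entropy = -sum(p * math.log2(p) for p in probs)
    return entropy / math.log2(len(readings))  # normalize to [0, 1]


if __name__ == "__main__":
    clause = "Delivery shall occur within a reasonable time."
    readings = ["within 30 days", "within 90 days", "at seller's discretion"]
    print(f"ambiguity = {ambiguity_score(clause, readings):.2f}")
```

A score near 0 suggests the model settles on a single reading; a score near 1 suggests the clause is maximally ambiguous among the offered readings.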
Self-Supervised Learning based on Heat Equation
Chen, Yinpeng; Dai, Xiyang; Chen, Dongdong; Liu, Mengchen; Yuan, Lu; Liu, Zicheng; Lin, Youzuo
This paper presents a new perspective on self-supervised learning based on extending the heat equation into a high-dimensional feature space. In particular, we remove the time dependence via a steady-state condition and extend the remaining 2D Laplacian from x-y isotropic to linearly correlated. Furthermore, we simplify it by splitting the x and y axes into two first-order linear differential equations. This simplification explicitly models spatial invariance along the horizontal and vertical directions separately, supporting prediction across image blocks. It leads to a very simple masked image modeling (MIM) method, named QB-Heat. QB-Heat leaves a single block, a quarter of the image in size, unmasked and extrapolates the other three masked quarters linearly. It brings MIM to CNNs without bells and whistles, and even works well for pre-training lightweight networks that are suitable for both image classification and object detection without fine-tuning. Compared with MoCo-v2 on pre-training a Mobile-Former with 5.8M parameters and 285M FLOPs, QB-Heat is on par in linear probing on ImageNet, but clearly outperforms it in non-linear probing, which adds a transformer block before the linear classifier (65.6% vs. 52.9%). When transferring to object detection with a frozen backbone, QB-Heat outperforms MoCo-v2 and supervised pre-training on ImageNet by 7.9 and 4.5 AP, respectively. This work provides an insightful hypothesis on the invariance of visual representations over different shapes and textures: a linear relationship between the horizontal and vertical derivatives. The code will be publicly released.
- North America > Mexico > Gulf of Mexico (0.14)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > Oregon > Marion County > Four Corners (0.04)
- (2 more...)
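Read literally, the derivation sketched in the abstract can be written out as follows. This is a reconstruction from the abstract's wording (steady state, linearly correlated Laplacian, two first-order equations, linear extrapolation across blocks); the paper's own notation for the matrices A and B may differ.

```latex
% Reconstruction of the QB-Heat derivation from the abstract's wording.
\begin{align}
  \frac{\partial z}{\partial t}
    &= \frac{\partial^2 z}{\partial x^2} + \frac{\partial^2 z}{\partial y^2}
    && \text{heat equation in feature space} \\
  0 &= \frac{\partial^2 z}{\partial x^2} + \frac{\partial^2 z}{\partial y^2}
    && \text{steady-state condition removes } t \\
  \frac{\partial z}{\partial x} = A z, \quad
  \frac{\partial z}{\partial y} &= B z
    && \text{split into two first-order linear equations} \\
  z(x + \Delta x,\, y) &= e^{A \Delta x}\, z(x, y)
    \approx (I + A \Delta x)\, z(x, y)
    && \text{linear extrapolation across image blocks}
\end{align}
```

Under this reading, the unmasked quarter block supplies z(x, y), and the three masked quarters are predicted by applying the learned linear maps along each axis.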